-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mango: fix $beginsWith range #4828
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more unit test cases to cover strings of invalid UTF-8 data.
@pgj I'm trying to think where it would make sense to add unit tests for non-UTF8 data. There are no existing eunit tests for |
If you implement the pre-check that I suggested then it can be established for When I experimented with this fix locally, I extended diff --git a/src/mango/src/mango_selector.erl b/src/mango/src/mango_selector.erl
index 93d3b10ca..24660e963 100644
--- a/src/mango/src/mango_selector.erl
+++ b/src/mango/src/mango_selector.erl
@@ -136,7 +136,10 @@ norm_ops({[{<<"$text">>, Arg}]}) when
norm_ops({[{<<"$text">>, Arg}]}) ->
?MANGO_ERROR({bad_arg, '$text', Arg});
norm_ops({[{<<"$beginsWith">>, Arg}]} = Cond) when is_binary(Arg) ->
- Cond;
+ case couch_util:validate_utf8(Arg) of
+ true -> Cond;
+ false -> ?MANGO_ERROR({bad_arg, '$beginsWith', Arg})
+ end;
% Not technically an operator but we pass it through here
% so that this function accepts its own output. This exists
% so that $text can have a field name value which simplifies With that, the existing definition of |
caf1e70
to
b2ebc29
Compare
src/mango/src/mango_selector.erl
Outdated
% invalid (prefix is not a utf8 string) | ||
?assertThrow( | ||
{mango_error, mango_selector, {invalid_operator, <<"$beginsWith">>}}, | ||
check_beginswith(<<123>>, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<<123>>
is the field name here not the prefix. Is this a mistake...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my mistake - corrected
b2ebc29
to
09da2d5
Compare
cc4fa13
to
0142ed8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Full CI tests passed https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FFullPlatformMatrix/detail/jenkins-mango-begingswith-fixes-2/1/pipeline but see the minor test fix comment. |
In the intial implementation of $beginsWith, the range calculation for json indexes mistakenly appends an integer with the size of 8 bits which gets maxed out at FF, rather than building a binary with an extra 3 bytes at the end. This commit fixes the `mango_idx_view:range/5` by correctly appending the `U+FFFF` code point to create a utf-8 encoded binary. Additionally, the Erlang `utf8` binary type ensures the result is a valid utf8 string. If `Arg` is not a utf8 binary, this will throw a badarg error. We expect `Arg` strings to be a valid utf8 but, to be safe, `mango_selector:norm_ops/1` is enhanced to verify that any argument to `$beginsWith` is a utf8 string.
0142ed8
to
586aaec
Compare
Overview
In the initial implementation of
$beginsWith
, the range calculation for view indexes mistakenly appends an integer with the size of 8 bits which gets maxed out at FF, rather than building a binary with an extra 3 bytes at the end.This PR fixes the
mango_idx_view:range/5
by correctly appending theU+FFFF
code point to create a utf-8 encoded binary. Additionally, the Erlangutf8
binary type ensures the result is a valid utf8 string. IfArg
is not a utf8 binary, this will throw a badarg error.U+FFFF
is used instead ofU+10FFFF
as it's the highest sorting code point according to the collator rules.We expect
Arg
strings to be a valid utf8 but, to be safe,mango_selector:norm_ops/1
is enhanced to verify that any argument to$beginsWith
is a utf8 string.Testing recommendations
make mango-test
Related Issues or Pull Requests
Initial implementation: #4810
Checklist
rel/overlay/etc/default.ini
src/docs
folder